On using high-level structured queries for integrating deep-web information sources

Authors

Carlos R. Rivero

Rafael Z. Frantz

David Ruiz

Rafael Corchuelo

Abstract

The actual value of the Deep Web comes from integrating the data its applications provide. Such applications offer human-oriented search forms as their entry points, and there exist a number of tools that are used to fill them in and retrieve the resulting pages programmatically. Solutions that rely on these tools are usually costly, which has motivated a number of researchers to work on virtual integration, also known as metasearch. Virtual integration abstracts away from the actual search forms by providing a unified search form, i.e., a programmer fills it in and the virtual integration system translates it into the application search forms. We argue that virtual integration costs might be reduced further if another abstraction level is provided by issuing structured queries in high-level languages such as SQL, XQuery or SPARQL; this helps abstract away from search forms altogether. As far as we know, there is no proposal in the literature that addresses this problem. In this paper, we propose a reference framework called IntegraWeb to solve the problems of using high-level structured queries to perform deep-web data integration. Furthermore, we provide a comprehensive report on existing proposals from the database integration and the Deep Web research fields, which can be used in combination to address our problem within this reference framework.
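
The translation step that the abstract attributes to virtual integration can be illustrated with a minimal sketch. It is not IntegraWeb or any system from the paper: the source names, URLs, and field mappings below are hypothetical, and skipping a source that cannot express every constraint is a simplifying assumption made only for this example.

```python
from urllib.parse import urlencode

# Hypothetical deep-web sources. Each one maps the attribute names of the
# unified search form to the field names of its own search form.
SOURCES = {
    "bookstore_a": {
        "action": "https://a.example/search",
        "fields": {"title": "q_title", "author": "q_author"},
    },
    "bookstore_b": {
        "action": "https://b.example/find",
        "fields": {"title": "t", "max_price": "price_to"},
    },
}


def translate(unified_query, sources=SOURCES):
    """Translate a unified query into one form-submission URL per source.

    Assumption for this sketch: a source is skipped when its form cannot
    express every attribute that the unified query constrains.
    """
    urls = {}
    for name, source in sources.items():
        fields = source["fields"]
        if not all(attr in fields for attr in unified_query):
            continue
        # Rename each unified attribute to the source's own field name.
        params = {fields[attr]: value for attr, value in unified_query.items()}
        urls[name] = source["action"] + "?" + urlencode(params)
    return urls


if __name__ == "__main__":
    for name, url in translate({"title": "Dune"}).items():
        print(name, url)
```

A structured-query layer of the kind the abstract argues for would sit one level above this: a SQL, XQuery or SPARQL query would first be reduced to attribute constraints such as `{"title": "Dune"}`, which are then translated into the individual forms.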

Similar resources

Querying Structured Information Sources Over the Web

To provide access to distributed and heterogeneous sources, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the source schemas. Queries posed to the mediated schema are reformulated in terms of the source schemas. On the Web, where sources are plentiful, autonomous and extremely volatil...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), which has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...

A Semantic Approach to Integrating XML and Structured Data

XML is fast becoming the standard for information exchange on the Internet. As such, information expressed in XML will need to be integrated with existing information systems, which are mostly based on structured data models such as relational, object-oriented or object-relational data models. This paper shows how our previous framework for integrating heterogeneous structured data sources can also...

A Semantic Approach to Integrating XML and Structured Data Sources CAiSE 01 ID 29

XML is fast becoming the standard for information exchange on the Internet. As such, information expressed in XML will need to be integrated with existing information systems, which are mostly based on structured data models such as relational, object-oriented or object/relational data models. This paper shows how our previous framework for integrating heterogeneous structured data sources can...

Deep Web Data Extraction Based on URL and Domain Classification

ISACA Journal, Volume 4, 2015. The rapid development of computer and networking technologies has increased the popularity of the web, which has led to the presence of more and more information on the web. However, the explosive increase of information online leads to some search problems: specifically, search engines usually return too many unrelated results on a given query. Deep web is content ...

Publication date: 2011
